@keewis keewis commented Oct 27, 2025

  • Closes #xxxx
  • Tests added
  • User visible changes (including notable bug fixes) are documented in whats-new.rst

Building on top of zarr-developers/zarr-python#3534, this is a draft PR that allows writing variable-sized chunks to zarr.

To see this in action, try:

# /// script
# requires-python = ">=3.11"
# dependencies = [
#   "xarray @ git+https://github.com/keewis/xarray.git@variable-chunking",
#   "zarr @ git+https://github.com/jhamman/zarr-python.git@feature/rectilinear-chunk-grid",
# ]
# ///

import numpy as np
import xarray as xr

rng = np.random.default_rng(seed=0)
values = rng.normal(size=(365, 20))

ds = xr.Dataset(
    {"a": (["time", "x"], values)},
    coords={"time": xr.date_range("2025-01-01", freq="d", periods=365)}
)
chunked = ds.chunk({"time": xr.groupers.TimeResampler(freq="ME"), "x": 10})

chunked.to_zarr(
    "variable_chunks.zarr",
    mode="w",
    safe_chunks=False,
    zarr_format=3,
    consolidated=False,
)

ds = xr.open_dataset("variable_chunks.zarr", engine="zarr", chunks={})
print(ds.chunksizes)
# Frozen({'time': (31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31), 'x': (10, 10)})

At the moment, this requires safe_chunks=False because I haven't changed the chunk alignment machinery yet.
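
The time chunks printed in the example are the calendar month lengths of 2025, which is what grouping by freq="ME" produces for a daily, one-year time axis. As a minimal sanity check of those sizes, using only the standard library (it does not touch xarray or zarr):

```python
import calendar

year = 2025

# Number of days in each month of the year, January through December.
month_lengths = tuple(
    calendar.monthrange(year, month)[1] for month in range(1, 13)
)

print(month_lengths)
# (31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31)
print(sum(month_lengths))
# 365
```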

cc @d-v-b, @jhamman, @dcherian

@github-actions github-actions bot added topic-backends topic-zarr Related to zarr storage library io labels Oct 27, 2025
# while dask chunks can be variable sized
# https://dask.pydata.org/en/latest/array-design.html#chunks
if var_chunks and not enc_chunks:
if zarr_format == 3:
@keewis keewis Oct 27, 2025

this check is probably not sufficient

@keewis keewis marked this pull request as draft October 27, 2025 16:34
jhamman commented Oct 27, 2025

We need zarr-python>=3, which doesn't work with @jhamman's fork because it doesn't have tags for versions above 3.0.0b2

I just pushed tags to my fork!

keewis commented Oct 27, 2025

thanks, I've changed the example back to using your fork

Comment on lines 304 to 315
  if any(len(set(chunks[:-1])) > 1 for chunks in var_chunks):
      raise ValueError(
-         "Zarr requires uniform chunk sizes except for final chunk. "
+         "Zarr v2 requires uniform chunk sizes except for final chunk. "
          f"Variable named {name!r} has incompatible dask chunks: {var_chunks!r}. "
          "Consider rechunking using `chunk()`."
      )
  if any((chunks[0] < chunks[-1]) for chunks in var_chunks):
      raise ValueError(
-         "Final chunk of Zarr array must be the same size or smaller "
+         "Final chunk of a Zarr v2 array must be the same size or smaller "
Member

This is not correct - it's unfortunately not as simple as "Zarr V3 supports variable-length chunking but Zarr V2 doesn't".
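
For reference, the two conditions in this hunk can be reproduced outside of xarray. The sketch below is only an illustration: `is_regular_chunking` is a hypothetical helper name, not part of xarray or zarr, and it only tells regular grids apart from variable-sized ones. It says nothing about which format or library version accepts the latter, which is the open question in this comment.

```python
def is_regular_chunking(var_chunks):
    """Hypothetical helper mirroring the two checks in the hunk.

    `var_chunks` is a tuple with one tuple of chunk sizes per dimension,
    in the same layout as dask's `.chunks`.
    """
    for chunks in var_chunks:
        # All chunks except the last one must have the same size.
        if len(set(chunks[:-1])) > 1:
            return False
        # The last chunk may be smaller than the others, but not larger.
        if chunks[0] < chunks[-1]:
            return False
    return True


monthly = ((31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31), (10, 10))
regular = ((100, 100, 100, 65), (10, 10))
oversized_tail = ((10, 20),)

print(is_regular_chunking(monthly))         # False
print(is_regular_chunking(regular))         # True
print(is_regular_chunking(oversized_tail))  # False
```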

jhamman commented Jan 8, 2026

@keewis - zarr-developers/zarr-python#3534 is approaching a mergeable state. Curious if you want to take another pass through this PR before we merge it and provide any feedback.

@keewis keewis added run-upstream Run upstream CI skip-ci labels Jan 8, 2026
keewis commented Jan 8, 2026

sure.

Do you know if there's a specific tell on whether rectilinear chunks are available? So far I've been using zarr_format == 3, but that's clearly not sufficient (and might even become wrong, should this ever be backported to the zarr 2 format). I can extend that by also checking if the RectilinearChunks and RegularChunks classes are exposed (see the most recent commit), but not sure if that's a sufficient check, either. Basically, what I'd like to check is whether the format extension is available.
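
The "are the classes exposed" check described here could look roughly like the sketch below. `module_exposes` is a hypothetical helper, and the module that would expose RectilinearChunks and RegularChunks is an assumption on my side (shown only in a comment). As noted above, the presence of the names does not prove that the format extension is actually enabled.

```python
import importlib


def module_exposes(module_name, names):
    """Hypothetical helper: True if `module_name` imports and has all `names`."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # Module missing entirely, so the feature cannot be available.
        return False
    return all(hasattr(module, name) for name in names)


# Assumed usage; the real module path of the classes may differ:
# module_exposes("zarr", ("RectilinearChunks", "RegularChunks"))

# Demonstration against the standard library:
print(module_exposes("math", ("sqrt", "pi")))           # True
print(module_exposes("math", ("no_such_name",)))        # False
print(module_exposes("no_such_module_xyz", ("sqrt",)))  # False
```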

I've posted a comment to the zarr PR (which doesn't seem to really affect the code here).

Finally, I still need to figure out how to change validate_grid_chunks_alignment, and we also need tests, both for reading and writing.

  f"Variable named {name!r} has incompatible dask chunks: {var_chunks!r}. "
- "Consider rechunking using `chunk()`."
+ "Consider rechunking using `chunk()`, or switching to the "
+ "zarr v3 format with zarr-python>=3.2."
@keewis keewis Jan 8, 2026

I still struggle with accurately expressing the prerequisites for rectilinear chunk support. Maybe this is fine, but we could also ask for "rectilinear chunk support"?

Suggested change
- "zarr v3 format with zarr-python>=3.2."
+ "zarr v3 format with enabled rectilinear chunk support."

  dask = { git = "https://github.com/dask/dask" }
  distributed = { git = "https://github.com/dask/distributed" }
- zarr = { git = "https://github.com/zarr-developers/zarr-python" }
+ zarr = { git = "https://github.com/jhamman/zarr-python", branch = "feature/rectilinear-chunk-grid" }
Collaborator Author

revert before merging:

Suggested change
- zarr = { git = "https://github.com/jhamman/zarr-python", branch = "feature/rectilinear-chunk-grid" }
+ zarr = { git = "https://github.com/zarr-developers/zarr-python" }
